Overview

Dataset Statistics

Number of Variables 11
Number of Rows 6,362,620
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 1.6 GB
Average Row Size in Memory 263.4 B
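The headline counts above can be reproduced with pandas. The following is a minimal sketch that assumes the data is already loaded into a DataFrame. The small frame below uses made-up rows, and the helper name `dataset_statistics` is hypothetical. Memory figures from `memory_usage(deep=True)` will not necessarily match the report, because the profiler does not state how it measures memory.

```python
import pandas as pd

def dataset_statistics(df: pd.DataFrame) -> dict:
    """Summarize missing cells, duplicate rows and memory use of a DataFrame."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing = int(df.isna().sum().sum())
    duplicates = int(df.duplicated().sum())  # rows identical to an earlier row
    # deep=True also counts the Python string objects held in object columns
    mem_bytes = int(df.memory_usage(deep=True).sum())
    return {
        "variables": n_cols,
        "rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
        "memory_bytes": mem_bytes,
        "avg_row_bytes": mem_bytes / n_rows if n_rows else 0.0,
    }

# Hypothetical rows, not taken from the real dataset
df = pd.DataFrame({
    "step": [1, 1, 2, 2],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 181.0, 181.0, 1864.28],
    "isFraud": ["0", "1", "1", "0"],
})
stats = dataset_statistics(df)
print(stats["rows"], stats["missing_cells"], stats["duplicate_rows"])
```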
Variable Types
  • Numerical: 6
  • Categorical: 5

Dataset Insights

amount is skewed
oldBalanceOrig is skewed
newBalanceOrig is skewed
oldBalanceDest is skewed
newBalanceDest is skewed
nameOrig has a high cardinality: 6353307 distinct values
nameDest has a high cardinality: 2722362 distinct values
isFraud has constant length 1
isFlaggedFraud has constant length 1
oldBalanceOrig has 2102449 (33.04%) zeros
newBalanceOrig has 3609566 (56.73%) zeros
oldBalanceDest has 2704388 (42.5%) zeros
newBalanceDest has 2439433 (38.34%) zeros
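The "zeros" and "high cardinality" findings can be approximated with a simple per-column scan. This is a sketch only. The profiler does not state its cut-offs, so `zero_threshold` and `cardinality_threshold` are hypothetical parameters, and the function name is illustrative. The balance values in the example are made up. The IDs are copied from the nameOrig sample rows.

```python
import pandas as pd

def zero_and_cardinality_flags(df: pd.DataFrame, zero_threshold: float = 0.1,
                               cardinality_threshold: int = 50):
    """Flag numeric columns with many zeros and text columns with many distinct values.

    Both thresholds are hypothetical; the report does not state its cut-offs.
    """
    n = len(df)
    zeros, high_cardinality = {}, {}
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            n_zero = int((s == 0).sum())
            if n_zero / n > zero_threshold:
                # (count, percentage rounded to 2 decimals), as in the insights list
                zeros[col] = (n_zero, round(100 * n_zero / n, 2))
        else:
            n_distinct = int(s.nunique())
            if n_distinct > cardinality_threshold:
                high_cardinality[col] = n_distinct
    return zeros, high_cardinality

df = pd.DataFrame({
    "oldBalanceOrig": [0.0, 0.0, 170136.0, 21249.0],  # made-up balances
    "nameOrig": ["C1231006815", "C1666544295", "C1305486145", "C840083671"],
})
zeros, high_card = zero_and_cardinality_flags(df, cardinality_threshold=3)
print(zeros)      # {'oldBalanceOrig': (2, 50.0)}
print(high_card)  # {'nameOrig': 4}
```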

Variables


step

numerical

Approximate Distinct Count 743
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 97.1 MB
Mean 243.3972
Minimum 1
Maximum 743
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • step is skewed right (γ1 = 0.3752)

Quantile Statistics

Minimum 1
5th Percentile 16
Q1 155
Median 238
Q3 334
95th Percentile 483
Maximum 743
Range 742
IQR 179
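Range and IQR follow directly from the quantiles: Range = Maximum − Minimum and IQR = Q3 − Q1. In the table above, for example, 743 − 1 = 742 and 334 − 155 = 179.

The sketch below computes the same quantile block with pandas. It assumes pandas' default linear interpolation, and the profiler may use a different or approximate quantile method. The input is a hypothetical uniform run of steps 1 to 743, not the real column, which is why its IQR differs from 179.

```python
import pandas as pd

def quantile_statistics(s: pd.Series) -> dict:
    # Linear interpolation between order statistics (pandas default);
    # the profiler's own quantile method is not stated in the report.
    q = s.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
    return {
        "min": s.min(),
        "p5": q.loc[0.05],
        "q1": q.loc[0.25],
        "median": q.loc[0.5],
        "q3": q.loc[0.75],
        "p95": q.loc[0.95],
        "max": s.max(),
        "range": s.max() - s.min(),
        "iqr": q.loc[0.75] - q.loc[0.25],
    }

steps = pd.Series(range(1, 744))  # hypothetical: one row for each step value 1..743
qs = quantile_statistics(steps)
print(qs["range"], qs["iqr"])
```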

Descriptive Statistics

Mean 243.3972
Standard Deviation 142.332
Variance 20258.39
Sum 1.5486e+09
Skewness 0.3752
Kurtosis 0.3291
Coefficient of Variation 0.5848
  • step has 102700 outliers
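The descriptive statistics, and one possible outlier count, can be sketched as follows. pandas' `skew()` is the adjusted Fisher-Pearson estimator and `kurt()` reports excess kurtosis. For small samples, either may differ slightly from the report's γ1 and kurtosis values. The report does not say how it counts outliers, so `count_iqr_outliers` assumes Tukey's 1.5 × IQR fences. That rule is a swapped-in choice, and both helper names are hypothetical.

```python
import pandas as pd

def descriptive_statistics(s: pd.Series) -> dict:
    mean = s.mean()
    std = s.std()  # sample standard deviation (ddof=1)
    return {
        "mean": mean,
        "std": std,
        "variance": s.var(),    # sample variance, equal to std ** 2
        "sum": s.sum(),
        "skewness": s.skew(),   # adjusted Fisher-Pearson coefficient
        "kurtosis": s.kurt(),   # excess kurtosis (normal distribution -> 0)
        "cv": std / mean,       # coefficient of variation
    }

def count_iqr_outliers(s: pd.Series, k: float = 1.5) -> int:
    # Tukey fences: an assumption, the report does not state its outlier rule
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(((s < lower) | (s > upper)).sum())

s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 40])  # hypothetical values
print(descriptive_statistics(s)["skewness"] > 0, count_iqr_outliers(s))
```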

type

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 439.4 MB

Length

Mean 7.4224
Standard Deviation 0.532
Median 7
Minimum 5
Maximum 8
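The Length block summarizes the number of characters per value. A minimal pandas sketch follows. It is applied here only to the five sample rows shown below, so its minimum (7) differs from the full column's minimum of 5. The helper name is hypothetical.

```python
import pandas as pd

def length_statistics(s: pd.Series) -> dict:
    lengths = s.astype(str).str.len()  # characters per value
    return {
        "mean": lengths.mean(),
        "std": lengths.std(),
        "median": lengths.median(),
        "min": int(lengths.min()),
        "max": int(lengths.max()),
    }

# The five sample rows of the type column
sample = pd.Series(["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"])
print(length_statistics(sample))
```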

Sample

1st row PAYMENT
2nd row PAYMENT
3rd row TRANSFER
4th row CASH_OUT
5th row PAYMENT

Letter

Count 43589101
Lowercase Letter 0
Space Separator 0
Uppercase Letter 43589101
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (CASH_OUT, PAYMENT) together account for over 50.0% of rows
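The Letter table appears to tally characters by Unicode general category. The sketch below reproduces that idea with `unicodedata.category` and also computes the share of the top-k categories.

One detail explains the Letter count: the underscore in CASH_OUT belongs to category Pc (connector punctuation), which the table does not list. That character is therefore not counted in any row, so the letter count (43,589,101) is lower than the total number of characters.

The mapping from the table's row names to category codes, and reading "Count" as all letters, are assumptions. The helper names are hypothetical.

```python
import unicodedata
from collections import Counter

import pandas as pd

# Assumed mapping from the table's row names to Unicode category codes
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def character_categories(s: pd.Series) -> dict:
    counts = Counter()
    for value in s.astype(str):
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    result = {name: counts.get(code, 0) for code, name in CATEGORY_NAMES.items()}
    # Assumption: "Count" in the Letter table means all letters (any L* category)
    result["Letter Count"] = sum(v for k, v in counts.items() if k.startswith("L"))
    return result

def top_k_share(s: pd.Series, k: int = 2) -> float:
    # Fraction of rows covered by the k most frequent values
    return float(s.value_counts(normalize=True).iloc[:k].sum())

sample = pd.Series(["PAYMENT", "PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"])
print(character_categories(sample))
print(top_k_share(sample))
```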

amount

numerical

Approximate Distinct Count 5316900
Approximate Unique (%) 83.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 97.1 MB
Mean 179861.9035
Minimum 0
Maximum 9.2446e+07
Zeros 16
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • amount is skewed right (γ1 = 30.9939)

Quantile Statistics

Minimum 0
5th Percentile 2387.1
Q1 13702.2908
Median 76838.4084
Q3 211719.86
95th Percentile 549308.2176
Maximum 9.2446e+07
Range 9.2446e+07
IQR 198017.5692

Descriptive Statistics

Mean 179861.9035
Standard Deviation 603858.2315
Variance 3.6464e+11
Sum 1.1444e+12
Skewness 30.9939
Kurtosis 1797.9553
Coefficient of Variation 3.3573
  • amount is not normally distributed (p-value ≈ 4.23e-25)
  • amount has 329513 outliers
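The report does not name the normality test behind these p-values. As an illustration only, the sketch below uses the Jarque-Bera test, a swapped-in choice, computed with the standard library. Under the null hypothesis of normality, the statistic JB = n/6 · (S² + K²/4) approximately follows a chi-squared distribution with 2 degrees of freedom, whose survival function is exp(−x/2). Here S is skewness and K is excess kurtosis. With millions of rows, almost any departure from normality yields a tiny p-value. The function name is hypothetical.

```python
import math
import random
from typing import Sequence

def jarque_bera(xs: Sequence[float]) -> tuple[float, float]:
    """Jarque-Bera statistic and asymptotic p-value (illustrative sketch).

    Uses population (biased) moment estimates of skewness and excess kurtosis.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Chi-squared with 2 degrees of freedom: P(X > jb) = exp(-jb / 2)
    return jb, math.exp(-jb / 2.0)

random.seed(0)
normal = [random.gauss(0.0, 1.0) for _ in range(5000)]
skewed = [random.expovariate(1.0) for _ in range(5000)]  # right-skewed, like amount
print(jarque_bera(normal))
print(jarque_bera(skewed))
```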

nameOrig

categorical

Approximate Distinct Count 6353307
Approximate Unique (%) 99.9%
Missing 0
Missing (%) 0.0%
Memory Size 458.0 MB

Length

Mean 10.4823
Standard Deviation 0.6041
Median 11
Minimum 5
Maximum 11

Sample

1st row C1231006815
2nd row C1666544295
3rd row C1305486145
4th row C840083671
5th row C2048537720

Letter

Count 6362620
Lowercase Letter 0
Space Separator 0
Uppercase Letter 6362620
Dash Punctuation 0
Decimal Number 60332420
  • nameOrig contains many distinct words: 6353307
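The report labels its counts "Approximate Distinct Count" but does not say which estimator it uses. HyperLogLog is a common choice for columns like nameOrig, where nearly every value is unique. The class below is a minimal sketch of that algorithm, not the profiler's implementation.

Several details are assumptions:

- The class name.
- Taking a 64-bit hash from SHA-1.
- The precision `p=14`, which gives 16,384 registers and roughly 0.8% standard error.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative sketch)."""

    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, value: str) -> None:
        # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice here)
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                     # first p bits select a register
        rest = h & ((1 << width) - 1)        # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:      # small-range correction
            return self.m * math.log(self.m / zeros)
        return e

hll = HyperLogLog(p=14)
for i in range(200_000):
    hll.add(f"C{i}")                         # hypothetical IDs
print(round(hll.estimate()))
```

Adding a value that was already seen never changes the registers. Repeated IDs therefore do not inflate the estimate.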

oldBalanceOrig

numerical

Approximate Distinct Count 1845844
Approximate Unique (%) 29.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 97.1 MB
Mean 833883.1041
Minimum 0
Maximum 5.9585e+07
Zeros 2102449
Zeros (%) 33.0%
Negatives 0
Negatives (%) 0.0%
  • oldBalanceOrig is skewed right (γ1 = 5.2491)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 14616
Q3 111571.9396
95th Percentile 6.1468e+06
Maximum 5.9585e+07
Range 5.9585e+07
IQR 111571.9396

Descriptive Statistics

Mean 833883.1041
Standard Deviation 2.8882e+06
Variance 8.3419e+12
Sum 5.3057e+12
Skewness 5.2491
Kurtosis 32.9649
Coefficient of Variation 3.4636
  • oldBalanceOrig is not normally distributed (p-value ≈ 4.55e-25)
  • oldBalanceOrig has 1097486 outliers
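The figures in each descriptive block are linked by simple identities:

- Variance = (Standard Deviation)².
- Sum = Mean × rows.
- Coefficient of Variation = Standard Deviation / Mean.
- IQR = Q3 − Q1.

The short check below plugs in the oldBalanceOrig numbers reported above, with n = 6,362,620 rows. It shows that they agree to within rounding.

```python
# Figures copied from the oldBalanceOrig section; n is the dataset's row count
n = 6_362_620
mean = 833883.1041
std = 2.8882e6
q1, q3 = 0.0, 111571.9396
reported = {"variance": 8.3419e12, "sum": 5.3057e12, "cv": 3.4636, "iqr": 111571.9396}

derived = {
    "variance": std ** 2,  # variance is the squared standard deviation
    "sum": mean * n,       # sum equals mean times row count
    "cv": std / mean,      # coefficient of variation
    "iqr": q3 - q1,        # interquartile range
}

for key, value in derived.items():
    rel_err = abs(value - reported[key]) / reported[key]
    print(f"{key}: derived {value:.6g}, reported {reported[key]:.6g}, rel. error {rel_err:.1e}")
```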

newBalanceOrig

numerical

Approximate Distinct Count 2682586
Approximate Unique (%) 42.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 97.1 MB
Mean 855113.6686
Minimum 0
Maximum 4.9585e+07
Zeros 3609566
Zeros (%) 56.7%
Negatives 0
Negatives (%) 0.0%
  • newBalanceOrig is skewed right (γ1 = 5.1769)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 153104.3104
95th Percentile 6.309e+06
Maximum 4.9585e+07
Range 4.9585e+07
IQR 153104.3104

Descriptive Statistics

Mean 855113.6686
Standard Deviation 2.924e+06
Variance 8.5501e+12
Sum 5.4408e+12
Skewness 5.1769
Kurtosis 32.067
Coefficient of Variation 3.4195
  • newBalanceOrig is not normally distributed (p-value ≈ 4.52e-25)
  • newBalanceOrig has 1023963 outliers
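The median of newBalanceOrig is 0 because more than half of its values (56.7%) are zero. For a non-negative column, every quantile below the zero share is forced to 0.

The same reasoning explains the other balance columns. Each of the four has more than 25% zeros, so its Q1 is 0 and its IQR equals its Q3.

The sketch below demonstrates this with synthetic columns that match the reported zero shares. The positive values 1, 2, 3, … are hypothetical placeholders, and the helper name is illustrative.

```python
import pandas as pd

def zero_quantiles(n_zeros: int, n_positive: int,
                   qs=(0.05, 0.25, 0.5, 0.75, 0.95)) -> dict:
    """Report which quantiles of a synthetic non-negative column equal zero."""
    # Placeholder positive values 1, 2, 3, ... stand in for real balances
    s = pd.Series([0.0] * n_zeros + [float(i) for i in range(1, n_positive + 1)])
    q = s.quantile(list(qs))
    return {p: bool(q.loc[p] == 0) for p in qs}

print(zero_quantiles(5673, 4327))  # ~56.7% zeros, like newBalanceOrig
print(zero_quantiles(3304, 6696))  # ~33.0% zeros, like oldBalanceOrig
```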

nameDest

categorical

Approximate Distinct Count 2722362
Approximate Unique (%) 42.8%
Missing 0
Missing (%) 0.0%
Memory Size 458.0 MB

Length

Mean 10.4818
Standard Deviation 0.6048
Median 11
Minimum 2
Maximum 11

Sample

1st row M1979787155
2nd row M2044282225
3rd row C553264065
4th row C38997010
5th row M1230701703

Letter

Count 6362620
Lowercase Letter 0
Space Separator 0
Uppercase Letter 6362620
Dash Punctuation 0
Decimal Number 60328785
  • nameDest contains many distinct words: 2722362
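The nameDest sample rows start with either "M" or "C", whereas the nameOrig samples all start with "C". The Letter table gives exactly one uppercase letter per row (6,362,620 letters over 6,362,620 rows), which suggests that each ID is a single letter prefix followed by digits.

The sketch below tallies the prefixes, shown here on the five sample rows. The helper name is hypothetical.

```python
import pandas as pd

def prefix_counts(ids: pd.Series) -> pd.Series:
    # Count IDs by their leading character (e.g. "C" or "M")
    return ids.str[0].value_counts()

sample = pd.Series(["M1979787155", "M2044282225", "C553264065",
                    "C38997010", "M1230701703"])
print(prefix_counts(sample))
```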

oldBalanceDest

numerical

Approximate Distinct Count 3614697
Approximate Unique (%) 56.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 97.1 MB
Mean 1.1007e+06
Minimum 0
Maximum 3.5602e+08
Zeros 2704388
Zeros (%) 42.5%
Negatives 0
Negatives (%) 0.0%
  • oldBalanceDest is skewed right (γ1 = 19.9218)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 139449.64
Q3 968344.57
95th Percentile 5.5743e+06
Maximum 3.5602e+08
Range 3.5602e+08
IQR 968344.57

Descriptive Statistics

Mean 1.1007e+06
Standard Deviation 3.3992e+06
Variance 1.1554e+13
Sum 7.0033e+12
Skewness 19.9218
Kurtosis 948.6734
Coefficient of Variation 3.0882
  • oldBalanceDest is not normally distributed (p-value ≈ 4.37e-25)
  • oldBalanceDest has 766669 outliers

newBalanceDest

numerical

Approximate Distinct Count 3555499
Approximate Unique (%) 55.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 97.1 MB
Mean 1.225e+06
Minimum 0
Maximum 3.5618e+08
Zeros 2439433
Zeros (%) 38.3%
Negatives 0
Negatives (%) 0.0%
  • newBalanceDest is skewed right (γ1 = 19.3523)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 221973.01
Q3 1.1393e+06
95th Percentile 5.8821e+06
Maximum 3.5618e+08
Range 3.5618e+08
IQR 1.1393e+06

Descriptive Statistics

Mean 1.225e+06
Standard Deviation 3.6741e+06
Variance 1.3499e+13
Sum 7.7942e+12
Skewness 19.3523
Kurtosis 862.1558
Coefficient of Variation 2.9993
  • newBalanceDest is not normally distributed (p-value ≈ 4.40e-25)
  • newBalanceDest has 720143 outliers

isFraud

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 400.5 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 1
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 6362620
  • The top 2 categories (0, 1) together account for over 50.0% of rows
  • The most frequent value (0) occurs over 773.7 times as often as the second most frequent value (1)
  • isFraud has words of constant length
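The 773.7 ratio compares how often each value occurs, not the values themselves. Given the row count n and a ratio r between the two counts, the minority class has about n / (r + 1) rows.

The sketch below computes the ratio with `value_counts`, checks for constant length, and recovers the minority count from the report's own figures. The labels in the example are hypothetical, and all helper names are illustrative.

```python
import pandas as pd

def frequency_ratio(s: pd.Series) -> float:
    """Count of the most frequent value divided by the second most frequent."""
    counts = s.value_counts()
    if len(counts) < 2:
        raise ValueError("need at least two distinct values")
    return float(counts.iloc[0] / counts.iloc[1])

def has_constant_length(s: pd.Series) -> bool:
    # True when every value has the same number of characters
    return s.astype(str).str.len().nunique() == 1

def minority_count(n_rows: int, ratio: float) -> float:
    # With two classes: majority = ratio * minority and majority + minority = n_rows
    return n_rows / (ratio + 1)

labels = pd.Series(["0"] * 997 + ["1"] * 3)  # hypothetical 0.3% positive rate
print(round(frequency_ratio(labels), 2), has_constant_length(labels))
print(round(minority_count(6_362_620, 773.7)))  # isFraud: about 8,213 positive rows
```

Applied to isFlaggedFraud, `minority_count(6_362_620, 397662.75)` gives 16 flagged rows.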

isFlaggedFraud

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 400.5 MB

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 6362620
  • The top 2 categories (0, 1) together account for over 50.0% of rows
  • The most frequent value (0) occurs over 397662.75 times as often as the second most frequent value (1)
  • isFlaggedFraud has words of constant length

Interactions

Correlations

Missing Values